Abstract 

Software quality is of primary concern in all large-scale expert system development 
efforts. Building appropriate validation and test tools for ensuring software reliability of 
expert systems is therefore required. 

The Expert Systems Validation Associate (EVA) is a validation system under 
development at the Lockheed Artificial Intelligence Center. EVA provides a wide range 
of validation and test tools to check the correctness, consistency, and completeness of 
an expert system. 

Testing is a major function of EVA. Testing means executing an expert system on test cases 
with the intent of finding errors. In this paper, we describe three types of testing: 
function-based testing, structure-based testing, and data-based testing. We also show 
how appropriate test cases may be selected in order to test an expert system 
thoroughly. 


INTRODUCTION 

It has been repeatedly shown that expert system technology in 
artificial intelligence can be used to implement many different 
applications, such as diagnostic systems, battle management 
systems, machine and robot control systems, monitoring systems, 
design systems, and manufacturing systems. Regardless of 
whether these expert systems are stand-alone or real-time 
embedded systems, we need to ensure that they are reliable, 
correct, consistent and complete. For this purpose, the Lockheed 
Artificial Intelligence Center started the Expert Systems 
Validation Associate (EVA) project in 1986 [Stachowitz et al. 1987a, 
1987b, 1987c]. EVA provides a wide range of validation and test 
tools to check the correctness, consistency and completeness of 
an expert system. 

Testing is a major function of EVA. Testing means executing an 
expert system on test cases with the intent of finding errors. A 
very good example of such testing is the Target Generation Facility 
(TGF), which provides simulated, real-time controllable aircraft 
targets to Air Traffic Control systems under test at the FAA 
Technical Center. 

In this paper, we consider three types of testing: function-based 
testing, structure-based testing and data-based testing. We 
describe how appropriate test cases may be selected in order to 
test an expert system thoroughly. 

BACKGROUND 

Expert systems are usually developed incrementally. The initial 
requirements for an expert system may be clearly stated. However, 
as the expert system evolves and is evaluated, the requirements 
may change or new requirements may be added. In many 
cases, even if the requirements do not change, there is no 
known algorithm for solving the problem. For example, there is 
no algorithm for performing parallel parking, even though the 
initial and final positions of a car can be specified precisely. 
Therefore, an expert system may have to be developed in repeated 
cycles of implementation, evaluation and modification steps. In 
the parallel parking example, a fuzzy (approximate) algorithm, 
represented by rules, may be tried first. The algorithm continues to 
be modified until a satisfactory performance is achieved. The goal 
of the test case generator is to generate "appropriate" test cases 
from the requirements specifications or from the expert system 
itself, so that users, expert collaborators, or system builders can 
perform thorough evaluations during the development or acceptance 
phase of the system production cycle. 

Testing of conventional software [DeMillo et al. 1981, 1987, Hetzel 
1984, Miller and Howden 1984, Zeil and White 1980] has been 
studied for a long time. As stated in [Hetzel 1984], software testing 
is a creative and difficult task that requires very good knowledge 
of the system being tested. Typically, the requirements cannot 
be processed automatically, or the relevant knowledge is buried 
inside the code of the system. Therefore, test cases are 
conventionally generated manually, which is tedious and error-prone. 

On the other hand, an expert system is usually implemented in a 
high-level language that supports high-level concepts such as 
objects, relations, categories, functional mappings, data types and 
data constraints. This knowledge can be used to generate test 
cases automatically. 

TYPES OF TEST CASES 

In this paper, we consider three types of test cases, namely, 
function-based test cases, structure-based test cases, and 
data-based test cases. 

To generate function-based test cases for an expert system, one 
requires knowledge about the system’s functions. Function-based 
testing is usually regarded as black box testing because it tests 
the external input-output behavior (specifications) of the system. In 
order to generate function-based test cases automatically, the 
generator must be provided with knowledge about the input-output 
specifications. 

Structure-based test cases explore the relations between rules. 
An expert system can be represented by a knowledge base 
consisting of facts and rules, which can be connected to make a 
connection graph. An arc in the connection graph denotes a match 
between a literal in the left hand side (LHS) of a rule and a literal in 
the right hand side (RHS) of a rule. Note that a fact can be 
considered as a rule with no conditions. Structure-based test cases 
are based upon the structure of the connection graph. The idea is to 
generate a set of test cases that exercises every rule in the 
connection graph at least once. 
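
To make the connection graph concrete, here is a minimal Python sketch. The paper itself uses Prolog notation. The sketch reduces each rule to the predicate names of its LHS (condition) and RHS (conclusion) literals. As a simplifying assumption, it matches literals by predicate name only instead of by full unification. The rule numbers and predicates come from the grading example later in this paper, with the arithmetic built-ins left out. `RULES` and `connection_graph` are illustrative names, not part of EVA.

```python
from collections import defaultdict

# Hypothetical encoding: rule number -> (LHS predicate names, RHS predicate names).
# Assumption: literals are matched by predicate name only (no unification);
# arithmetic built-ins such as Score >= 90 are omitted.
RULES = {
    1: (["student", "expect", "right_answers", "compute_grade"], ["grade"]),
    2: ([], ["right_answers"]),
    3: (["right_answers"], ["right_answers"]),
    4: (["right_answers"], ["right_answers"]),
    5: ([], ["compute_grade"]),
    6: ([], ["compute_grade"]),
    7: ([], ["compute_grade"]),
    8: ([], ["compute_grade"]),
}

def connection_graph(rules):
    """Return arcs (producer, predicate, consumer): an RHS literal of
    `producer` matches an LHS literal of `consumer`."""
    producers = defaultdict(list)          # predicate -> rules concluding it
    for num, (_, rhs) in rules.items():
        for pred in rhs:
            producers[pred].append(num)
    arcs = []
    for num, (lhs, _) in rules.items():
        for pred in lhs:
            for prod in producers[pred]:
                arcs.append((prod, pred, num))
    return arcs

arcs = connection_graph(RULES)
```

For these rules the sketch yields 13 arcs. One example is (2, 'right_answers', 1), because rule (2) concludes a right_answers literal that rule (1) uses as a condition. No arc ends at student or expect, since no rule concludes them.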

The difference between function-based and structure-based 
testing can be illustrated by using an electrical circuit: function-based 
testing means checking whether the light goes on when we throw 
the switch, while structure-based testing means inspecting 
whether all parts are connected properly into the circuit. To 
perform function-based testing, we do not need to look inside the 
circuit box; therefore, it is called black box testing. On the other 
hand, since we need to look inside the circuit box to see how parts 
are connected, structure-based testing is called white box 
testing. 

Data-based test cases are based upon data definitions for the 
expert system. The data definitions consist of data declarations and 
data constraints. The data declarations are schema statements 
for data domains, relations and objects. A data constraint is 
specified by a logic formula using object-level and/or meta-level 
predicates. 

FUNCTION-BASED TESTING 

Input data to an expert system are usually represented by facts 
that are instances of schemas. Let us call these schemas input 
schemas. Each test case contains a set of facts of the input 
schemas. For each set of input facts, the expert system will 
produce a set of output facts (data), which are instances of output 
schemas. 

The input and output schemas may not be declared explicitly. 
They may be implicitly contained in the connection graph of the 
expert system. In this case, we consider only the rule part of the 
connection graph. In a connection graph, there are two kinds of 
leaf nodes, namely, input nodes and output nodes. An input node 
is an LHS literal of a rule that is not connected to any RHS literal. 
An output node is an RHS literal of a rule that is not 
connected to any LHS literal. The schemas of input and output 
nodes will be considered as the input and output schemas, 
respectively. 

In order to thoroughly cover all different types of input test cases, 
we must systematically categorize input and output data by explicit 
declarations. For each set of input facts in certain categories, we 
specify the expected output facts, or the expected categories of 
the output facts, or the data constraints that the expected output 
facts have to satisfy. 

Consider the airline inquiry system in [Hetzel 1984]. The 
specifications of the system are given as follows: The inputs are 1) a 
transaction identifying departure and destination cities and travel 
date, and 2) tables of flight information showing flights available and 
seats remaining. The system checks the flight tables for the 
desired city. If there is no flight to that city, it prints message 1, "No 
flight". If there is a flight, but seats are not available, it prints 
message 2, "Sold out". If there is a flight and seats are available, it 
displays that fact. Therefore, the expected output is either a flight 
display, or message 1, or message 2. 

For this example, the relational schema is: 

flight(flight#, from_city, to_city, date, seats_reserved, capacity) 

where flight# is a key. The data base contains a collection of 
ground instances (facts) of the flight relation. To generate test 
cases, we specify the following categories of flights: 

category(flight, no_flight(X,Y,D)) :-       /* no flight from X to Y */
    A = {F# | flight(F#,X,Y,D,_,_)}, count(A) = 0.
category(flight, single(X,Y,D)) :-          /* single flight from X to Y */
    A = {F# | flight(F#,X,Y,D,_,_)}, count(A) = 1.
category(flight, multiple(X,Y,D)) :-        /* multiple flights from X to Y */
    A = {F# | flight(F#,X,Y,D,_,_)}, count(A) > 1.
category(flight, full(F#,X,Y,D)) :-         /* flight F# from X to Y is full */
    flight(F#,X,Y,D,S,C), count(S) = C.
category(flight, available(F#,X,Y,D)) :-    /* flight F# is available */
    flight(F#,X,Y,D,S,C), count(S) < C.

Similarly, the categories of the output on a computer display are: 
category(output, one_line).
category(output, multiple_lines).
category(output, message_1).
category(output, message_2).

(Note that we use the Prolog syntax for representing facts and 
rules, where a variable is written as a string beginning either with a 
capital letter or with an underscore.)


For each set of input facts (data) belonging to a certain 
combination of categories, we specify the expected output. For this 
example, the input-output relationships specified in terms of 
categories are given as follows: 

(1) single & available --> one_line.
(2) multiple & available --> multiple_lines.
(3) no_flight --> message_1.
(4) single & full --> message_2.
(5) multiple & full --> message_2.

Based upon these functional specifications, the test case gener- 
ator can generate the following test cases to cover different input 
scenarios: 

CASE 1A: Flight available (only flight to the city). 
EXPECTED RESULT: Display one line. 

CASE 1B: Flight available (multiple flights to the city). 
EXPECTED RESULT: Display multiple lines. 

CASE 2: No flight. 
EXPECTED RESULT: Message 1. 

CASE 3A: No seats (only flight to the city). 
EXPECTED RESULT: Message 2. 

CASE 3B: No seats (multiple flights, all full). 
EXPECTED RESULT: Message 2. 

CASE 4: Flight available (one flight full, but another open). 
EXPECTED RESULT: Display lines and Message 2. 

Note that each of CASE 1A through 3B corresponds to one of the 
input-output relationships specified above. However, CASE 4 is 
generated by using the input-output relationships (2) and (5). This 
is possible because the conditions in the input-output relationships 
(2) and (5) are not mutually exclusive: when there are several flights 
to the same city, some may be available while others are full. 
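
The test cases above can also be derived mechanically. The Python sketch below shows one possible approach, not necessarily the generator's actual method. It encodes relationships (1) to (5) as sets of required input categories. It then enumerates small flight tables for a single query (X, Y, D) and keeps one table for each distinct set of relationships it triggers. Each flight is reduced to a hypothetical (seats_reserved, capacity) pair. We also assume that full and available hold when at least one matching flight is full, or has a free seat, respectively.

```python
from itertools import product

# Relationships (1)-(5): number -> (required input categories, expected output).
RELATIONSHIPS = {
    1: (frozenset({"single", "available"}), "one_line"),
    2: (frozenset({"multiple", "available"}), "multiple_lines"),
    3: (frozenset({"no_flight"}), "message_1"),
    4: (frozenset({"single", "full"}), "message_2"),
    5: (frozenset({"multiple", "full"}), "message_2"),
}

def categories(flights):
    """Input categories for the flights matching one query; each flight is (reserved, capacity)."""
    if not flights:
        cats = {"no_flight"}
    elif len(flights) == 1:
        cats = {"single"}
    else:
        cats = {"multiple"}
    if any(reserved == cap for reserved, cap in flights):
        cats.add("full")
    if any(reserved < cap for reserved, cap in flights):
        cats.add("available")
    return cats

def triggered(flights):
    """Numbers of the relationships whose input conditions all hold."""
    cats = categories(flights)
    return frozenset(n for n, (needed, _) in RELATIONSHIPS.items() if needed <= cats)

def generate_test_cases(max_flights=2, capacity=2):
    """Keep one small flight table per distinct set of triggered relationships."""
    cases = {}
    for k in range(max_flights + 1):
        # Each flight is either one seat short of capacity (open) or exactly full.
        for reserved in product((capacity - 1, capacity), repeat=k):
            flights = tuple((r, capacity) for r in reserved)
            rels = triggered(flights)
            if rels and rels not in cases:
                cases[rels] = flights
    return {rels: (flights, sorted({RELATIONSHIPS[n][1] for n in rels}))
            for rels, flights in cases.items()}

for rels, (flights, outputs) in generate_test_cases().items():
    print(sorted(rels), flights, outputs)
```

In enumeration order, the six tables it finds correspond to CASE 2, 1A, 3A, 1B, 4 and 3B. The mixed table, with one open and one full flight, triggers relationships (2) and (5) together and expects both multiple lines and message 2.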

STRUCTURE-BASED TESTING 

Another important source of test cases derives from the structure 
of a knowledge base, namely, the connection graph. The 
advantage of structure-based testing is that the generation of test 
cases depends only upon the connection graph. It does not have to 
rely upon other information, such as the input-output specification 
of the system represented by the knowledge base. 

The basic concept in structure-based testing is one of complete 
coverage. The assumption is that every rule in the connection 
graph serves some purpose in handling certain situations. 
Therefore, every rule should be useful, i.e., used at some time, 
and the goal of structure-based testing is to generate a set of 
test cases that exercises every rule at least once. An algorithm for 
generating such test cases follows: 

(1) Generate the connection graph of the knowledge base. 

(2) Generate the rule flow diagram from the connection graph. 
Note that a rule flow diagram is a directed graph where nodes 
denote rules, and arcs denote rule execution sequences. 

(3) Create a set of paths in the flow diagram such that each 
node (rule) is covered by at least one path in the set. 

(4) Generate test cases to traverse these paths. 
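
Step (3) is a path-cover problem. One simple way to approach it is sketched below in Python under our own assumptions. The sketch enumerates loop-free paths from the entry rule to the terminal rules, then picks paths greedily until every rule is covered. The graph `FLOW` is our hypothetical reading of the grading rules given below. In that reading, rule 1 calls right_answers, which rules 2 to 4 solve, and then compute_grade, which rules 5 to 8 solve. It is not a reproduction of Figure 1, and greedy selection does not guarantee a minimal set of paths.

```python
# Hypothetical rule flow diagram for the grading rules: rule -> rules that may fire next.
FLOW = {
    1: [2, 3, 4],      # right_answers is solved by rule (2), (3) or (4)
    3: [2, 3, 4],      # recursive call of right_answers
    4: [2, 3, 4],
    2: [5, 6, 7, 8],   # after the base case, compute_grade is solved by (5)-(8)
}
EXITS = {5, 6, 7, 8}

def simple_paths(graph, start, exits):
    """Depth-first enumeration of loop-free paths from start to an exit rule."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        if path[-1] in exits:
            yield path
        for succ in graph.get(path[-1], []):
            if succ not in path:
                stack.append(path + [succ])

def covering_paths(graph, start, exits):
    """Greedy cover: repeatedly pick the path covering the most still-uncovered rules."""
    uncovered = set(graph) | {s for succs in graph.values() for s in succs}
    candidates = list(simple_paths(graph, start, exits))
    chosen = []
    while uncovered:
        best = max(candidates, key=lambda p: len(uncovered & set(p)), default=None)
        if best is None or not uncovered & set(best):
            break  # the remaining rules cannot be reached from the start rule
        chosen.append(best)
        uncovered -= set(best)
    return chosen, uncovered

paths, unreachable = covering_paths(FLOW, 1, EXITS)
```

For this graph the greedy choice returns four paths, one ending in each of rules (5) to (8). The paths may differ from those listed in the text, because loops such as 3 to 3 are not needed for coverage.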

Consider a rule-based system that computes the grade of a 
student from his answers to a quiz. The system compares his 
answers with the expected answers, counts the number of right 
answers, computes a numerical score, and then records the 
grade. His answers are represented by student(Name, Answers), 
and the expected answers are represented by 
expect(N_questions, Correct_answers). An instance of student 
and an instance of expect constitute an input to the system. The 
rules for this system are given as follows: 


(1) grade(Name, Grade) :- student(Name, Answers),
        expect(N_questions, Correct_answers),
        right_answers(Answers, Correct_answers, N_rights),
        Ratio is N_rights/N_questions,
        Score is Ratio*100,
        compute_grade(Score, Grade).

(2) right_answers([], [], 0).

(3) right_answers([X|Y], [X|Z], R1) :-
        right_answers(Y, Z, R),
        R1 is R+1.

(4) right_answers([_|Y], [_|Z], R) :- right_answers(Y, Z, R).

(5) compute_grade(Score, a) :- Score >= 90.

(6) compute_grade(Score, b) :- Score < 90, Score >= 80.

(7) compute_grade(Score, c) :- Score < 80, Score >= 70.

(8) compute_grade(Score, f) :- Score < 70.

The rule flow diagram for these rules is shown in Figure 1. From 
the rule flow diagram, we can construct, for example, the set of 
paths [1,3,4,2,8], [1,3,2,5], [1,4,3,3,3,2,7], and [1,3,4,3,3,3,2,6]. 
This set has complete coverage of the rules, because every rule 
appears in the set at least once. For each path in the set, we 
collect all the conditions of the rules in the path and find values 
that satisfy the conditions. If such values exist, then the path can 
be traversed, and the values can be used as a test case. The test 
cases for the paths are shown in Table 1. 

TABLE 1. TEST CASES FOR COMPLETE COVERAGE 

TEST CASE                                 PATH TRAVERSED
student(john, [yes,yes]).                 [1,3,4,2,8]
expect(2, [yes,no]).

student(smith, [yes]).                    [1,3,2,5]
expect(1, [yes]).

student(peter, [yes,yes,yes,no]).         [1,4,3,3,3,2,7]
expect(4, [no,yes,yes,no]).

student(mary, [yes,no,yes,no,yes]).       [1,3,4,3,3,3,2,6]
expect(5, [yes,yes,yes,no,yes]).
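
Finding values that satisfy the conditions collected along a path could be done with a constraint solver. The Python sketch below swaps in plain brute-force search instead, which is enough for this small example. `trace` is a hand translation of rules (1) to (8) that records which rules fire, following the first solution a Prolog interpreter would find. `find_input` is a hypothetical helper that tries every yes/no answer sheet with up to five questions.

```python
from itertools import product

def trace(answers, correct):
    """Rule numbers (1)-(8) fired, in order, when grading one student.

    Assumes len(answers) == len(correct) > 0. Rule (3) fires when the current
    answers agree, rule (4) otherwise, then rule (2) on the empty lists,
    then one of (5)-(8) depending on the score.
    """
    fired = [1]
    n_rights = 0
    for given, expected in zip(answers, correct):
        if given == expected:
            fired.append(3)
            n_rights += 1
        else:
            fired.append(4)
    fired.append(2)
    score = 100 * n_rights / len(correct)  # multiply first so 4 of 5 gives exactly 80.0
    if score >= 90:
        fired.append(5)
    elif score >= 80:
        fired.append(6)
    elif score >= 70:
        fired.append(7)
    else:
        fired.append(8)
    return fired

def find_input(path, max_len=5, values=("yes", "no")):
    """Brute-force search for (answers, correct) that traverse `path`; None if none found."""
    for n in range(1, max_len + 1):
        for answers in product(values, repeat=n):
            for correct in product(values, repeat=n):
                if trace(answers, correct) == path:
                    return list(answers), list(correct)
    return None
```

For each of the four paths listed above, find_input returns an input that traverses it, and the Table 1 inputs yield the paths shown there. A path such as [1,2,5] comes back as None, since it would need a quiz with no questions, for which rule (1) cannot compute Ratio.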


DATA-BASED TESTING 

We now consider test cases that are derived from data definitions. 
Such test cases are called data-based test cases. Data definitions 
include data declarations and data constraints. In an expert system 
shell, data declarations are specified by data schema statements. 
Since maintaining the integrity of facts and rules in a 
knowledge base is important, we also need to specify the data 
constraints that the facts and rules must satisfy. Any fact or rule 
that violates a data constraint will not be inserted into the 
knowledge base. We can use logical formulas to represent the 
data constraints. 

By means of the data declarations and data constraints in the 
expert system, we can generate good and bad test cases. A good 
test case satisfies the data declarations and data constraints and 
should be accepted by the expert system, while a bad test case 
violates them and should be rejected by the expert system. 
Because the goal is to test the expert system with difficult examples, 
we should generate some extreme cases that barely satisfy or 
violate the data constraints, or contain large or small values. 

Consider input data on triangles specified by 

RELATIONAL SCHEMA: 

triangle(side1:number, side2:number, side3:number). 

DATA CONSTRAINT: 

triangle(X,Y,Z) → X + Y > Z ∧ 
                  X + Z > Y ∧ 
                  Y + Z > X. 
The data constraint says that the sum of any two sides of a 
triangle is greater than the remaining side. From the above data 
declaration and data constraint, we can generate the following 
extreme test cases. (Note that the first five test cases are bad, while 
the last two are good.) 


EXTREME TEST CASES                COMMENTS

triangle(1, 1, 2)                 A straight line
triangle(0, 0, 0)                 A point
triangle(4, 0, 3)                 A zero side
triangle(1, 2, 3.00001)           Close to a triangle
triangle(9170, 8942, 1)           Very small angle
triangle(.0001, .0001, .0001)     Very small triangle
triangle(83127, 74326, 96652)     Very large triangle
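
The good/bad verdicts above follow directly from the declaration and the constraint. The Python sketch below checks them mechanically. `accepts` tests the schema (three numbers) and then the constraint. `near_edge` is a hypothetical generator of extreme cases on the edge a + b = c, just outside it, and just inside it, with an assumed tolerance `eps`.

```python
def is_triangle(x, y, z):
    """Data constraint: the sum of any two sides is greater than the remaining side."""
    return x + y > z and x + z > y and y + z > x

def accepts(sides):
    """A good test case matches the declaration (three numbers) and the constraint."""
    return (len(sides) == 3
            and all(isinstance(s, (int, float)) and not isinstance(s, bool) for s in sides)
            and is_triangle(*sides))

def near_edge(a, b, eps=1e-5):
    """Hypothetical generator: cases on, just outside, and just inside the edge a + b = c."""
    c = a + b
    return [((a, b, c), False), ((a, b, c + eps), False), ((a, b, c - eps), True)]

# The extreme cases from the table, paired with the verdict stated in the text.
EXTREME_CASES = [
    ((1, 1, 2), False), ((0, 0, 0), False), ((4, 0, 3), False),
    ((1, 2, 3.00001), False), ((9170, 8942, 1), False),
    ((.0001, .0001, .0001), True), ((83127, 74326, 96652), True),
]

for sides, good in EXTREME_CASES + near_edge(1, 2):
    assert accepts(sides) == good, sides
```

Here near_edge(1, 2) regenerates the "close to a triangle" case (1, 2, 3.00001) together with the straight line (1, 2, 3) and a barely valid counterpart (1, 2, 2.99999).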


For an applicative system, which takes an input and produces an 
output, a test case means a simulated instance of input together 
with its expected output. However, for an imperative system that 
may alter data structures or produce side effects, just generating 
test cases of input is not enough. An imperative system can be 
represented by a state machine with a number of states. For 
each state, there are a certain number of actions that take the 
state into other states. For the state machine, a test case is 
actually a test scenario that consists of an initial state and a 
sequence of specific actions. The goal is to check whether bad 
states will be encountered when we run the state machine with the 
test scenario. A bad state is one that violates the integrity 
constraints, or one in which no actions are available. 

Consider the following example: Container A can hold 5 gallons of 
water and container B 2 gallons of water. Initially, A is full and B is 
empty. Assume that water can be poured from A to B, and B to the 
drain. We would like to get to a final state where A is empty and B 
is half-full. The initial and final states are shown in Figure 2. 



Figure 2. Initial and Final States 


We use state(X,Y) to denote a state where X and Y are the 
amounts of water in containers A and B, respectively, and use 
pour(X,Y,Q) to denote an operation to pour Q gallons of water 
from X into Y. 

Let transition(Op,X,Y) denote that the operation Op changes state 
X to state Y, and let reach(Seq,X,Y) denote that the sequence of 
operations, Seq, changes state X to state Y. 


The constraints on states and operations are specified as follows: 

pour(a, b, X) → 0 < X ≤ 2. 
pour(b, drain, X) → 0 < X ≤ 2. 
state(X, Y) → 0 ≤ X ≤ 5 ∧ 0 ≤ Y ≤ 2. 
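
These constraints are enough for a generator to look for bad scenarios mechanically. The Python sketch below is our assumption of how this could be done, not EVA's actual procedure. It takes the extreme quantity allowed by the operation constraints (Q = 2) and applies operations naively from the initial state, without any of the knowledge base's guards. It then reports every short sequence whose states break the state constraint. Names such as `bad_scenarios` are illustrative.

```python
from itertools import product

CAPACITY = {"a": 5, "b": 2}  # gallons

def op_ok(op):
    """Operation constraint: 0 < Q <= 2 for pour(a,b,Q) and pour(b,drain,Q)."""
    return 0 < op[2] <= 2

def state_ok(state):
    """State constraint: 0 <= X <= 5 and 0 <= Y <= 2."""
    x, y = state
    return 0 <= x <= CAPACITY["a"] and 0 <= y <= CAPACITY["b"]

def apply_naively(state, op):
    """Effect of an operation with no guard at all, so overflow or negatives can occur."""
    x, y = state
    src, _dst, q = op
    return (x - q, y + q) if src == "a" else (x, y - q)

def bad_scenarios(initial, length=2, quantities=(2,)):
    """Yield (operation sequence, first bad state) for sequences of extreme operations."""
    ops = [(s, d, q) for s, d in (("a", "b"), ("b", "drain")) for q in quantities
           if op_ok((s, d, q))]
    for seq in product(ops, repeat=length):
        state = initial
        for op in seq:
            state = apply_naively(state, op)
            if not state_ok(state):
                yield seq, state
                break

for seq, state in bad_scenarios((5, 0)):
    print(seq, "->", state)
```

Starting from (5, 0), the first sequence reported is pour(a,b,2), pour(a,b,2), which reaches (1, 4). The two sequences that begin by draining an empty container B are also reported.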

From the constraints, we can generate the following test scenario: 

state(5,0).                   initial state
pour(a,b,2), pour(a,b,2).     input sequence

This is a bad test scenario because the second operation in the 
input sequence will cause container B to overflow. If we had the 
following knowledge base, 


(1) transition(pour(a,b,Z), state(X,Y), state(U,V)) :-
        Z > 0,
        U is X - Z,
        V is Y + Z.

(2) transition(pour(b,drain,Y), state(X,Y), state(X,0)) :- Y > 0.

(3) reach([Op], S1, S2) :- transition(Op, S1, S2).

(4) reach([Op|Seq], S1, S3) :-
        transition(Op, S1, S2),
        reach(Seq, S2, S3).

the bad scenario would be "successfully" processed, ending in 
state(1,4), because Rule (1) is wrong: it never checks that A holds 
at least Z gallons or that B has room for them. The correct version 
of the rule should be 

(1') transition(pour(a,b,Z), state(X,Y), state(U,V)) :-
        Z > 0,
        U is X - Z,
        V is Y + Z,
        X >= Z,
        V =< 2.

The corrected rule makes sure that container B cannot overflow. 

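
The effect of the correction can be checked by running the bad scenario through both versions of the rule. The Python sketch below is a hand translation of rules (1), (1'), (2), (3) and (4), not EVA output. It assumes that each operation matches at most one transition rule, so no backtracking is needed. The function names are illustrative.

```python
# Operations are (source, destination, gallons); states are (A, B) in gallons.

def pour_ab_rule1(op, state):
    """Rule (1) as originally written: only requires Z > 0."""
    src, dst, z = op
    x, y = state
    if (src, dst) == ("a", "b") and z > 0:
        return (x - z, y + z)
    return None

def pour_ab_rule1_fixed(op, state):
    """Corrected rule (1'): also requires X >= Z and V =< 2."""
    src, dst, z = op
    x, y = state
    if (src, dst) == ("a", "b") and z > 0:
        u, v = x - z, y + z
        if x >= z and v <= 2:
            return (u, v)
    return None

def drain_rule2(op, state):
    """Rule (2): pour all Y gallons of B into the drain."""
    src, dst, q = op
    x, y = state
    if (src, dst) == ("b", "drain") and q == y and y > 0:
        return (x, 0)
    return None

def reach(seq, state, pour_ab):
    """Rules (3) and (4): apply a non-empty sequence; None where Prolog would fail."""
    if not seq:
        return None
    for op in seq:
        nxt = pour_ab(op, state)
        if nxt is None:
            nxt = drain_rule2(op, state)
        if nxt is None:
            return None
        state = nxt
    return state

bad = [("a", "b", 2), ("a", "b", 2)]
print(reach(bad, (5, 0), pour_ab_rule1))        # (1, 4): B holds 4 gallons
print(reach(bad, (5, 0), pour_ab_rule1_fixed))  # None: the scenario is rejected
```

With the corrected rule, the sequence pour(a,b,2), pour(b,drain,2), pour(a,b,2), pour(b,drain,2), pour(a,b,1) reaches (0, 1). This is the final state in which A is empty and B is half full.
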
CONCLUSION 

Test cases of input values to an expert system can be generated 
automatically. However, the expected output and performance for 
each test case may not be known, or not clearly defined, or stated 
in qualitative or narrative statements. In this case, the system's 
output and performance for the generated (simulated) test cases 
may have to be evaluated by independent human experts. The 
experts' evaluation results can be stored and used with the test 
cases again when the expert system is modified. 

We have described systematic ways for automatic test case 
generation. For large expert systems, this is essential because 
manual approaches are tedious and possibly biased. 

We have started work on implementing components of the test 
case generator. First, we will generate structure-based test cases 
because they do not depend upon specifications and 
metaknowledge. Then, we will consider data-based and finally 
function-based test cases. 